Docket: SUN03-05(030036) 



What is claimed is:

1. A processing device to support parallel execution of multiple instructions, the
processing device comprising: 

a trace detector that identifies traces in a segment of code including 
successive instructions, each of multiple identified traces in the segment of code 
including a set of instructions capable of being executed on an execution unit; and 

a dependency detector that, prior to parallel execution of multiple 
identified traces on corresponding execution units, analyzes the traces identified 
in the segment of code to determine a dependency order for executing the traces, 
the dependency order identifying at least one of the traces associated with the 
segment of code that cannot be properly executed in parallel with another trace in 
the segment of code. 

2. A processing device as in claim 1, wherein the traces each include a sequence of
contiguous instructions intended to be executed successively in time and the 
dependency order indicates which of the multiple traces must be executed before 
others identified in the segment of code. 

3. A processing device as in claim 2 further comprising:

a scheduler that schedules parallel execution of traces detected within a 
basic block of JAVA code on multiple execution units according to the 
dependency order. 

4. A processing device as in claim 1, wherein the trace detector, in identifying
traces in the segment of code, identifies operand stack dependencies associated
with portions of the segment of code, and wherein the corresponding execution
units each include an operand stack.

5. A processing device as in claim 1, wherein the dependency detector analyzes the
traces to determine data dependencies associated with traces in the segment of
code and to identify the dependency order for executing at least some of the
traces in parallel at run time.

6. A processing device as in claim 1 further comprising:

multiple execution units to execute multiple traces in parallel based on the 
dependency order; 

a buffer to temporarily store results associated with execution of multiple 
executed traces; and 

a comparator circuit that, at run time of executing the multiple traces in 
parallel, identifies an out-of-order memory dependency condition associated with 
parallel executed traces resulting in an error; and 

the comparator circuit, in response to identifying the out-of-order memory 
dependency condition: 

squashes execution of latter traces in the segment of code that
depend on results from earlier traces;

clears results in the temporary buffer associated with the squashed
traces; and

reschedules squashed traces for later execution.
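
By way of non-limiting illustration, the squash, clear, and reschedule response recited above might be sketched in Python as follows. The function name `handle_violation`, the `depends_on` map, the dictionary used as a temporary buffer, and the queue used for rescheduling are all hypothetical names introduced for this sketch. The sketch also assumes that squashing propagates transitively to traces that depend on a squashed trace, which the claim language does not expressly require.

```python
from collections import deque


def handle_violation(later_trace, depends_on, temp_buffer, reschedule_queue):
    """Squash a later trace and (by assumption) every trace that depends on it.

    depends_on maps a trace id to the set of trace ids whose results it needs.
    temp_buffer maps a trace id to its speculative, uncommitted results.
    """
    squashed, frontier = set(), [later_trace]
    while frontier:
        trace = frontier.pop()
        if trace in squashed:
            continue
        squashed.add(trace)
        # Assumption: dependents of a squashed trace are squashed as well.
        frontier.extend(t for t, deps in depends_on.items() if trace in deps)
    for trace in squashed:
        temp_buffer.pop(trace, None)            # clear buffered results
    reschedule_queue.extend(sorted(squashed))   # reschedule for later execution
    return squashed


depends_on = {"T2": {"T1"}, "T3": {"T2"}, "T4": set()}
temp_buffer = {"T1": {0x10: 5}, "T2": {0x10: 7}, "T3": {0x20: 1}, "T4": {0x30: 9}}
queue = deque()
squashed = handle_violation("T2", depends_on, temp_buffer, queue)
print(sorted(squashed), sorted(temp_buffer), list(queue))
```

In this example the violation is attributed to trace T2, so T2 and its dependent T3 are squashed, while the buffered results of T1 and T4 are left untouched.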

7. A processing device as in claim 1 further comprising:

multiple execution units that execute multiple traces in parallel based on 
the dependency order; 

a temporary buffer to store results associated with execution of multiple 
traces of the segment of code; and 

a comparator circuit to identify that an out-of-order memory dependency 
condition associated with parallel executed traces did not occur at run time of 
executing the multiple traces in parallel, the comparator circuit, in response to 
identifying that an out-of-order memory dependency condition did not occur at
run time of a particular trace, loads the results stored in the temporary buffer to
memory after the particular trace completes execution. 

8. A processing device as in claim 1, wherein at least one of the traces is processed
to include a folded bytecode instruction replacing a corresponding sequence of 
bytecode instructions. 

9. A processing device as in claim 1 further comprising:

a fetcher that fetches multiple code instructions from different traces 
identified in the segment of code; 

a decoder that decodes the multiple fetched code instructions into 
corresponding bytecode instructions; and 

a buffer unit to store the bytecode instructions associated with the multiple 
decoded code instructions in corresponding trace buffers for each trace. 

10. A processing device as in claim 1 further comprising:

a temporary buffer that stores results associated with execution of multiple
traces; and

wherein the comparator circuit identifies an out-of-order memory
dependency condition based on a search for:

i) a READ after a WRITE to the same memory address for different 
parallel executed traces, 

ii) a WRITE after a READ to the same memory address for different 
parallel executed traces, and 

iii) a WRITE after a WRITE to the same memory address for different 
parallel executed traces. 
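
By way of non-limiting illustration, the three-part search recited above might be sketched in Python as follows. The `Access` record, its fields, and the function name `find_hazards` are hypothetical names introduced for this sketch, and the sketch assumes that each access is tagged with its position in original program order. It flags cross-trace access pairs to the same address; whether a flagged pair actually executed out of order at run time is a separate check that is not modeled here.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Access:
    trace_id: int   # trace that issued the access (hypothetical field)
    order: int      # position in original program order (assumed available)
    kind: str       # "READ" or "WRITE"
    address: int    # memory address touched


# (earlier access kind, later access kind) -> hazard label
HAZARDS = {
    ("WRITE", "READ"): "RAW",    # i) READ after WRITE
    ("READ", "WRITE"): "WAR",    # ii) WRITE after READ
    ("WRITE", "WRITE"): "WAW",   # iii) WRITE after WRITE
}


def find_hazards(accesses):
    """Return (label, earlier, later) for cross-trace pairs on one address."""
    found = []
    ordered = sorted(accesses, key=lambda a: a.order)
    for earlier, later in combinations(ordered, 2):
        if earlier.trace_id == later.trace_id:
            continue                      # only different traces matter
        if earlier.address != later.address:
            continue                      # only the same memory address matters
        label = HAZARDS.get((earlier.kind, later.kind))
        if label:                         # READ followed by READ is harmless
            found.append((label, earlier, later))
    return found


log = [
    Access(0, 0, "WRITE", 0x10),
    Access(1, 1, "READ", 0x10),
    Access(1, 2, "WRITE", 0x20),
    Access(0, 3, "READ", 0x30),
    Access(2, 4, "WRITE", 0x20),
]
print([label for label, _, _ in find_hazards(log)])
```

The pairwise comparison is quadratic in the number of logged accesses, which is acceptable for a sketch; a hardware comparator would more plausibly match addresses associatively.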

11. A processing device as in claim 1, wherein the dependency detector, in analyzing
the traces in the segment of code, determines the dependency order by comparing
memory access instructions in a first trace to memory access instructions in other
traces identified in the segment of code to identify a potential trace dependency in
which the first trace contains a memory access instruction that depends on the
operation of another memory access instruction in at least one of the other traces
identified in the segment of code.

12. A processing device as in claim 1, wherein the trace detector, in identifying
traces within the segment of code:

identifies a beginning trace instruction in the segment of code whose 
operation corresponds to a first clean condition of an execution unit; 

identifies a subsequent trace instruction in the segment of code whose 
operation corresponds to a non-clean condition of the execution unit; and 

identifies an ending trace instruction in the segment of code whose 
operation follows the first clean condition and the non-clean condition of the 
execution unit and that corresponds to at least one of: 

i) a second clean condition of the execution unit; and 

ii) an end of the segment of code; and 

designates, as a trace within the segment of code, all instructions in the 
segment of code including, and in-between, the beginning trace instruction and 
the ending trace instruction. 
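
By way of non-limiting illustration, the trace identification recited above might be sketched in Python as follows. The sketch assumes that a "clean condition" corresponds to an empty operand stack, an interpretation suggested by the operand stack language of claim 4 but not stated in this claim. The function name `find_traces` and the representation of each instruction as a (mnemonic, net stack effect) pair are hypothetical.

```python
def find_traces(instructions):
    """Split a code segment into traces bounded by clean stack conditions.

    instructions is a list of (mnemonic, net_stack_effect) pairs.
    Returns inclusive (begin_index, end_index) pairs, one per trace.
    """
    traces, begin, depth = [], None, 0
    for index, (_, effect) in enumerate(instructions):
        if begin is None:
            begin = index            # stack is clean: beginning trace instruction
        depth += effect              # depth > 0 is the non-clean condition
        if depth == 0:               # stack clean again: ending trace instruction
            traces.append((begin, index))
            begin = None
    if begin is not None:            # end of the segment closes an open trace
        traces.append((begin, len(instructions) - 1))
    return traces


# Bytecode-like segment for "a = b + c; d = e * f;" plus one trailing load.
segment = [
    ("iload_1", +1), ("iload_2", +1), ("iadd", -1), ("istore_3", -1),
    ("iload", +1), ("iload", +1), ("imul", -1), ("istore", -1),
    ("iload_1", +1),
]
print(find_traces(segment))
```

As a simplification, an instruction with zero net stack effect that is encountered while the stack is clean forms a single-instruction trace in this sketch.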

13. A processing device as in claim 1, wherein the dependency detector, upon
completion of execution of at least two traces, updates the dependency order to
remove any trace dependencies associated with other non-executed traces that
depended on completion of execution of the at least two executed traces.
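
By way of non-limiting illustration, the update of the dependency order recited above might be sketched in Python as follows. Representing the dependency order as a map from each trace to the set of traces it waits on is an assumption made for this sketch, and the names `complete` and `ready` are hypothetical.

```python
def complete(dependency_order, finished):
    """Return a new dependency order with the finished traces removed.

    dependency_order maps a trace id to the set of trace ids it waits on.
    """
    finished = set(finished)
    return {
        trace: deps - finished               # drop edges to finished traces
        for trace, deps in dependency_order.items()
        if trace not in finished             # drop the finished traces themselves
    }


def ready(dependency_order):
    """Traces with no outstanding dependencies, eligible for parallel issue."""
    return sorted(t for t, deps in dependency_order.items() if not deps)


order = {"T1": set(), "T2": set(), "T3": {"T1"}, "T4": {"T2", "T3"}}
first_wave = ready(order)                    # T1 and T2 may run in parallel
order = complete(order, first_wave)
second_wave = ready(order)                   # T3 is released; T4 still waits on T3
print(first_wave, second_wave)
```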

14. A method associated with parallel execution of multiple instructions, the method
comprising: 

identifying traces in a segment of code including successive instructions, 
each of multiple identified traces in the segment of code including a set of 
instructions capable of being executed on an execution unit; and

prior to parallel execution of multiple identified traces on corresponding
execution units, analyzing the traces identified in the segment of code to 
determine a dependency order for executing the traces, the dependency order 
identifying at least one of the traces associated with the segment of code that 
cannot be properly executed in parallel with another trace in the segment of code. 

15. A method as in claim 14, wherein the traces each include a sequence of
contiguous instructions intended to be executed successively in time and the 
dependency order indicates which of the multiple traces must be executed before 
others identified in the segment of code. 

16. A method as in claim 15 further comprising:

scheduling parallel execution of traces detected within a basic block of 
JAVA code on multiple execution units according to the dependency order. 

17. A method as in claim 14, wherein identifying traces in the segment of code
includes identifying operand stack dependencies associated with portions of the
segment of code, and wherein the corresponding execution units each include an
operand stack.

18. A method as in claim 14, wherein analyzing the traces includes determining data
dependencies associated with traces in the segment of code to identify the 
dependency order for executing at least some of the traces in parallel at run time. 

19. A method as in claim 14 further comprising:

executing multiple traces in parallel based on the dependency order; 

temporarily storing results associated with execution of the multiple traces 
in a temporary buffer; and 

at run time, identifying an out-of-order memory dependency condition 
associated with parallel executed traces resulting in an error; and 

in response to identifying the out-of-order memory dependency condition:

squashing execution of latter traces in the segment of code that
depend on results from earlier traces and clearing results in the temporary 
buffer associated with the squashed traces; and 

rescheduling squashed traces for later execution. 

20. A method as in claim 14 further comprising:

executing multiple traces in parallel based on the dependency order; 

storing results associated with execution of multiple traces of the segment 
of code in a temporary buffer; and 

identifying that an out-of-order memory dependency condition associated 
with parallel executed traces did not occur at run time of executing the multiple 
traces in parallel; and 

in response to identifying that an out-of-order memory dependency 
condition did not occur at run time of a particular trace: 

storing the results stored in the temporary buffer to memory after
the particular trace completes execution.

21. A method as in claim 14, wherein at least one of the traces is processed to include
a folded JAVA bytecode instruction replacing a corresponding sequence of JAVA 
bytecode instructions. 

22. A method as in claim 14 further comprising:

fetching multiple code instructions from different traces identified in the 
segment of code; 

decoding the multiple fetched code instructions into corresponding 
bytecode instructions; and 

storing the bytecode instructions associated with the multiple decoded 
code instructions in corresponding trace buffers for each trace. 

23. A method as in claim 14 further comprising:

storing results associated with execution of multiple traces in a temporary
buffer; and

wherein identifying an out-of-order memory dependency condition
includes searching for:

i) a READ after a WRITE to the same memory address for different 
parallel executed traces, 

ii) a WRITE after a READ to the same memory address for different 
parallel executed traces, and 

iii) a WRITE after a WRITE to the same memory address for different 
parallel executed traces. 

24. A method as in claim 14, wherein analyzing the traces in the segment of code to
determine a dependency order includes: 

comparing memory access instructions in a first trace to memory access 
instructions in other traces identified in the segment of code to identify a potential 
trace dependency in which the first trace contains a memory access instruction 
that depends on the operation of another memory access instruction in at least one 
of the other traces identified in the segment of code. 

25. A method as in claim 14, wherein identifying traces within the segment of code
comprises: 

identifying a beginning trace instruction in the segment of code whose 
operation corresponds to a first clean condition of an execution unit; 

identifying a subsequent trace instruction in the segment of code whose 
operation corresponds to a non-clean condition of the execution unit; and 

identifying an ending trace instruction in the segment of code whose 
operation follows the first clean condition and the non-clean condition of the 
execution unit and that corresponds to at least one of: 

i) a second clean condition of the execution unit; and 

ii) an end of the segment of code; and

designating, as a trace within the segment of code, all instructions in the
segment of code including, and in-between, the beginning trace instruction and 
the ending trace instruction. 

26. A method as in claim 14, wherein identifying the dependency order further
comprises:

upon completion of execution of at least two traces, updating the
dependency order to remove any trace dependencies associated with other
non-executed traces that depended on completion of execution of the at least two
executed traces.

27. A computer program product including a computer-readable medium having
instructions stored thereon for processing data information, such that the 
instructions, when carried out by a processing device, enable the processing 
device to perform the steps of: 

identifying traces in a segment of code including successive instructions, 
each of multiple identified traces in the segment of code including a set of 
instructions capable of being executed on an execution unit; and 

prior to parallel execution of multiple identified traces on corresponding 
execution units, analyzing the traces identified in the segment of code to 
determine a dependency order for executing the traces, the dependency order 
identifying at least one of the traces associated with the segment of code that 
cannot be properly executed in parallel with another trace in the segment of code. 

28. A processing device to support parallel execution of multiple instructions, the
processing device comprising: 

means for identifying traces in a segment of code including successive 
instructions, each of multiple identified traces in the segment of code including a 
set of instructions capable of being executed on an execution unit; and 

means for analyzing the multiple traces identified in the segment
of code, prior to parallel execution of the multiple identified traces on
corresponding execution units, to determine a dependency order for executing the
multiple identified traces, the dependency order identifying at least one of the 
traces associated with the segment of code that cannot be properly executed in 
parallel with another trace in the segment of code. 

29. A processing device to support parallel execution of multiple instructions, the
processing device comprising: 

a fetcher to fetch instructions; 

a trace detector coupled to receive the fetched instructions, the trace 
detector identifying traces in a segment of code including successive instructions, 
each of multiple identified traces in the segment of code including a set of 
instructions capable of being executed on an execution unit; 

a dependency detector that, prior to parallel execution of multiple 
identified traces on corresponding execution units, analyzes the traces identified 
in the segment of code to determine a dependency order for executing the traces, 
the dependency order identifying at least one of the traces associated with the 
segment of code that cannot be properly executed in parallel with another trace in 
the segment of code; 

a trace scheduler coupled to the dependency detector and the trace 
detector, the trace scheduler receiving a set of traces and, based on the 
dependency order, causing the corresponding execution units to execute traces 
within the set of traces in parallel, the execution taking place in an execution 
order that is based on the identified dependency order, at least two traces being
executed in parallel and, if the dependency order indicates that a second trace is
dependent upon a first trace, the first trace being executed prior to the second
trace; and

multiple execution units to execute the traces in parallel. 

30. A processing device as in claim 29 further comprising:

a temporary buffer coupled to the execution units to store results
associated with execution of multiple traces of the segment of code; and

a comparator circuit to detect whether an out-of-order memory
dependency condition associated with parallel executed traces occurs at run time
of executing the multiple traces in parallel, the comparator circuit conditionally
loading the results stored in the temporary buffer to memory after a particular
trace completes execution.

31. A processing device as in claim 30 further comprising a squash circuit coupled to
receive a signal from the comparator circuit identifying detection of an
out-of-order memory dependency condition, the squash circuit:

squashing execution of latter traces in the segment of code that
depend on results from earlier traces;

clearing results in the temporary buffer associated with the
squashed traces; and

generating a signal to the trace scheduler to reschedule squashed
traces for later execution.

32. A processing device as in claim 29 further comprising:

a basic block trace table cache to store trace information associated with a 
currently executed method. 

33. A processing device as in claim 32 further comprising:

bytecode trace fetch logic that utilizes multiple program counters stored in 
the basic block trace table cache to order the fetcher to fetch multiple instructions 
from multiple locations of a method cache. 

34. A processing device as in claim 32 further comprising:

a decoded bytecode trace buffer including individual buffers, each 
individual buffer storing instructions for a given trace. 

35. A processing device as in claim 34, wherein the trace scheduler:

identifies non-dependent traces based on the trace information in
the basic block trace table cache;

selects the set of traces to be executed on corresponding execution
units;

allocates execution units to execute the set of traces in parallel; and

fetches the set of traces from the decoded bytecode trace buffer for
parallel execution by the execution units.

36. A processing device as in claim 29, wherein the execution units each include an
operand stack, a reservation station and an associated functional unit. 

37. A processing device as in claim 29, wherein the execution units each include
multiple sets of shared local variable registers, each set of local variable registers 
being utilized by a corresponding method. 

38. A processing device as in claim 29, wherein the execution units each include:

a load buffer and a store buffer to temporarily store retrieved and modified 
data associated with multiple parallel executed traces in a scratchpad area. 



